Fix reordering avro nullable types in MergeHiveSchemaWithAvro#140
Fix reordering avro nullable types in MergeHiveSchemaWithAvro#140rzhang10 wants to merge 1 commit intolinkedin:li-0.11.xfrom
Conversation
cb095b2 to
d58b362
Compare
|
LGTM, thanks for the PR @rzhang10. |
| return false; | ||
| } | ||
|
|
||
| public static int getNullIndex(Schema schema) { |
There was a problem hiding this comment.
nullIndexForOptionSchema
Iceberg does not use get in method names. nulls can also be present in complex schemas, so ensure that we add the OptionSchema constraint in the method name itself.
| if (AvroSchemaUtil.isOptionSchema(schema)) { | ||
| int nullIndex = AvroSchemaUtil.getNullIndex(schema); | ||
| if (defaultValue != null && !defaultValue.equals(JsonProperties.NULL_VALUE)) { | ||
| return Schema.createUnion(schema.getTypes().get(1 - nullIndex), Schema.create(Type.NULL)); |
There was a problem hiding this comment.
Hey @rzhang10 , I am having a hard time reasoning about these changes. Can you please include a description of how these changes help fix the issue. Can you also update the Java doc of this method with the new changes?
There was a problem hiding this comment.
Ok I understand why this is needed. Previously reorderOptionIfRequired only worked correctly when schema was an option schema who first element was null. This was fine because we never returned an option with null as the second element from the children visitors, but now we do. I think the description needs to be updated to reflect the new behavior.
| int nullIndex = 0; | ||
| if (partner != null && AvroSchemaUtil.isOptionSchema(partner)) { | ||
| nullIndex = AvroSchemaUtil.getNullIndex(partner); | ||
| } | ||
| Schema hivePrimitive = hivePrimitiveToAvro(primitive); | ||
| // if there was no matching Avro primitive, use the Hive primitive | ||
| Schema result = partner == null ? hivePrimitive : checkCompatibilityAndPromote(hivePrimitive, partner); | ||
| return shouldResultBeOptional ? AvroSchemaUtil.toOption(result) : result; | ||
| return shouldResultBeOptional ? AvroSchemaUtil.toOption(result, nullIndex == 1) : result; |
There was a problem hiding this comment.
This handling is incomplete, it only handles cases for primitives inside containers (e.g. list), but does not handle similar cases for nested lists, maps, structs, etc. E.g. this case will fail
@Test
public void shouldNotReorderListElementType1() {
String hive = "struct<fa:array<array<int>>>";
Schema avro =
SchemaBuilder.record("r1")
.fields()
.name("fa")
.type()
.nullable()
.array()
.items()
.nullable()
.array()
.items()
.nullable()
.intType()
.arrayDefault(Collections.singletonList(Arrays.asList(1, 2, 3)))
.endRecord();
assertSchema(avro, merge(hive, avro));
}
I think we need similar handling in other places where we use the shouldResultBeOptional pattern, e.g. in the struct, list and map methods. Might be good to refactor the common pattern into a separate method for reuse.
This patch fixes a bug where the avro nullable type orders (the order of null and the other type's order) is wrongly flipped, resulting in read failure.
The fix is demonstrated by the new unit test
shouldNotReorderListElementType